I've been training a model recently on a rather large dataset (the 0_gt_wavs folder holds 1h10 of audio), and my epochs are taking 43 minutes on average. I'm running a GTX 1080, and my GPU usage looks like this: https://i.imgur.com/EE9SUXp.png
My training parameters:
'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\\model1/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\\model1', 'experiment_dir': './logs\\model1', 'save_every_epoch': 10, 'name': 'model1', 'total_epoch': 500, 'pretrainG': 'pretrained_v2/f0G40k.pth', 'pretrainD': 'pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 1, 'save_every_weights': '0', 'if_cache_data_in_gpu': 0}
Am I doing something obviously wrong? Is there a way to optimize my training parameters to reduce the epoch duration? I've previously trained something where the GPU usage was constantly at 100% and not fluctuating so much, but I can't remember which settings were different. It was definitely a smaller dataset.
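To put rough numbers on this, here is a minimal Python sketch for estimating how many optimizer steps one epoch contains and how long each step takes. It assumes one training sample per non-empty line of filelist.txt and roughly ceil(samples / batch_size) steps per epoch; the actual trainer may bucket or drop samples, so treat the result as an estimate. The function names and the sample count of 1000 are hypothetical, only batch_size, segment_size, sampling_rate, hop_length and the 43-minute epoch come from the post.

```python
import math
from typing import Iterable


def count_entries(lines: Iterable[str]) -> int:
    """Count non-empty lines (assumed: one training sample per line)."""
    return sum(1 for line in lines if line.strip())


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Approximate steps per epoch, assuming the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def seconds_per_step(epoch_minutes: float, num_samples: int, batch_size: int) -> float:
    """Average wall-clock seconds spent on one step."""
    return epoch_minutes * 60 / steps_per_epoch(num_samples, batch_size)


def segment_seconds(segment_size: int, sampling_rate: int) -> float:
    """Length in seconds of one training segment taken from a clip."""
    return segment_size / sampling_rate


if __name__ == "__main__":
    # Values from the posted config.
    batch_size = 6
    segment_size = 12800
    sampling_rate = 40000
    hop_length = 400

    # Hypothetical sample count; replace with count_entries() on the real filelist.
    num_samples = 1000

    print("segment length (s):", segment_seconds(segment_size, sampling_rate))
    print("frames per segment:", segment_size // hop_length)
    print("steps per epoch:", steps_per_epoch(num_samples, batch_size))
    print("seconds per step:", round(seconds_per_step(43, num_samples, batch_size), 2))
```

To use the real sample count, pass an open file handle, for example `count_entries(open("./logs/model1/filelist.txt", encoding="utf-8"))`. If the seconds-per-step figure is high while GPU usage keeps dropping, that would suggest the GPU is waiting on data loading rather than being the limiting factor, though this sketch alone cannot confirm it.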
A follow-up question: if there are parameters to change, how can I abort the current training run and resume it with the modified parameters?
Thanks in advance!
submitted by /u/induna_crewneck